Reducing Power Consumption in a Server Cluster

ABSTRACT

A method of reducing power consumption of a server cluster of host systems with virtual machines executing on the host systems is provided. The method includes recommending host system power-on when there is a host system whose utilization is above a target utilization, and recommending host system power-off when there is a host system whose utilization is below the target utilization. Recommending host system power-on includes calculating impact of powering on a standby host system with respect to reducing the number of highly-utilized host systems in the server cluster. Recommending host system power-off includes calculating impact of powering off a host system with respect to decreasing the number of less-utilized host systems in the server cluster.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 12/557,284 filed on Sep. 10, 2009, issued as U.S. Pat. No. 9,047,083, which claims the benefit of U.S. Provisional Application No. 61/096,909, filed on Sep. 15, 2008, the contents of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

One or more embodiments of the present invention relate generally to virtual machines executing on server clusters, and more particularly, to reducing power consumption in such server clusters.

BACKGROUND

Computer virtualization is a technique that involves encapsulating a physical computing machine platform into a virtual machine that is executed under the control of virtualization software on a hardware computing platform. Virtualization software enables multiple virtual machines to be run on a single hardware computing platform, and can manage the allocation of computing resources to each virtual machine in accordance with constraints and objectives.

A set of hardware computing platforms can be organized as a server cluster to provide computing resources, for example, for a data center. In addition, supporting technology can move running virtual machines between servers in the cluster; an example of this supporting technology is sold as VMware VMotion™ by VMware, Inc. of Palo Alto, Calif. In addition, server cluster virtualization management software that incorporates cluster resource management technology can determine initial and ongoing locations of virtual machines on hardware computing platforms in the server cluster, and can manage the allocation of cluster computing resources in accordance with constraints and objectives. An example of this server cluster virtualization management software is sold as VMware Distributed Resource Scheduler™ by VMware, Inc. of Palo Alto, Calif. In addition, the server cluster virtualization management software can request that a server in the cluster power itself down, and can use mechanisms available in the marketplace to remotely power on a server that is powered down. An example of this power management software is sold as the VMware Distributed Power Management feature within the VMware Distributed Resource Scheduler by VMware, Inc. of Palo Alto, Calif.

Server clusters consume significant power. The cost of that power is a major expense in operating a server cluster, and generating that power can have an environmental impact.

SUMMARY

In one embodiment, a method of reducing power consumption of a server cluster of host systems with virtual machines executing on the host systems is disclosed. The method includes recommending host system power-on when there is a host system whose utilization is above a target utilization, and recommending host system power-off when there is a host system whose utilization is below the target utilization. Recommending host system power-on includes calculating the impact of powering on a standby host system with respect to reducing the number of highly-utilized host systems in the server cluster. The impact of powering on is calculated by simulating moving some virtual machines from highly utilized host systems to the standby host system being recommended to be powered on. Recommending host system power-off includes calculating the impact of powering off a host system with respect to decreasing the number of less-utilized host systems in the server cluster. The impact of powering off is calculated by simulating moving some or all virtual machines from the host system, which is being recommended to be powered off, to less-utilized host systems. In the preferred embodiment, all running virtual machines are moved off a host before powering the host off (or simulating powering the host off). In another embodiment, one or more selected classes of VMs that are designated as being OK to leave on the host and power off along with the host are not moved or factored into the power-off simulation calculations. Therefore, in one embodiment, the term “moving all VMs” means either moving all running VMs, or moving all running VMs except one or more selected classes of VMs that are designated as being OK to leave on the host during the host power-off.

In another embodiment, a system for reducing power consumption of a server cluster of host systems with virtual machines executing on the host systems is disclosed. The system includes a cluster management server to manage the server cluster, the cluster management server including a distributed resource scheduling (DRS) module to manage allocation of resources to the virtual machines running on the server cluster and a distributed power management (DPM) module coupled to the DRS module to recommend powering-on or powering-off of a host system in the server cluster to save power.

In yet another embodiment, a computer readable medium having program instructions for reducing power consumption of a server cluster of host systems with virtual machines executing on the host systems is disclosed. The computer readable medium includes program instructions for recommending host system power-on when there is a host system whose utilization is above a target utilization, and recommending host system power-off when there is a host system whose utilization is below the target utilization. The program instructions for recommending host system power-on include program instructions for calculating the impact of powering on a standby host system with respect to reducing the number of highly-utilized host systems in the server cluster, the impact of powering on being calculated by simulating moving some virtual machines from highly utilized host systems to the standby host system being recommended to be powered on. The program instructions for recommending host system power-off include program instructions for calculating the impact of powering off a host system with respect to decreasing the number of less-utilized host systems in the server cluster, the impact of powering off being calculated by simulating moving some or all virtual machines from the host system, which is being recommended to be powered off, to less-utilized host systems.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a pictorial representation of a computer system that includes a server cluster in accordance with one or more embodiments of the present invention; and

FIG. 2 is a block diagram representing an example of a host system included in the server cluster shown in FIG. 1.

DETAILED DESCRIPTION

One or more embodiments of the present invention are a method, machine-readable medium, and a system for reducing power consumption of a server cluster. In particular, one embodiment is a method of reducing power consumption of a server cluster of host systems with virtual machines executing on the host systems, the method comprising: considering recommending host system power-on when there is a host system whose utilization is above a target utilization range, and considering recommending host system power-off when there is a host system whose utilization is below the target utilization range; wherein considering recommending host system power-on comprises iterating as follows: for each host system, determining utilization as the ratio of demand to capacity for the host system, and if the utilization for any host system is over a target utilization, iterating through standby host systems by determining a “what if” plan assuming the standby host system were powered on, and quantifying an impact of powering on the standby host system by determining a sum of a weighted distance above the target utilization for each host system above the target utilization, both with the standby host system assumed powered on and with the standby host system powered off, and if the sum improves with the standby host system powered on, recommending that the standby host system be powered on; and wherein considering recommending host system power-off comprises iterating as follows: for each host system, determining utilization, and if the utilization for any host system is under a target utilization, iterating through powered-on host systems by determining a “what if” plan assuming the powered-on host system were powered off, and quantifying an impact of powering off the host system by determining a sum of a weighted distance below the target utilization for each host system below the target utilization, both with the host system kept powered on and with the host system powered off, and if the sum improves with the host system powered off and the sum of weighted distances above the target utilization is not worse than that with the host system kept powered on, recommending that the host system be powered off.

FIG. 1 is a pictorial representation of computer system 10 that includes server cluster 20 in accordance with one or more embodiments of the present invention. As shown in FIG. 1, server cluster 20 includes a plurality of hardware computing platforms 11-19 (also referred to herein as host systems 11-19) that are grouped or clustered together (physically or logically). Although only nine host systems 11-19 are shown in FIG. 1, in practice, server cluster 20 may include an arbitrary number of host systems. As further shown in FIG. 1, server cluster virtualization management software 21 runs on cluster management server 24. Server cluster virtualization management software 21 includes user interface 26 and is in data communication with each of host systems 11-19. User interface 26 facilitates data communication with server cluster virtualization management software 21 to enable a user to control operations of server cluster 20, as is discussed more fully below.

FIG. 2 is a block diagram representing an example of a host system included in server cluster 20 shown in FIG. 1. Referring to FIG. 2, each of host systems 11-19 includes physical hardware and virtualization software. The physical hardware, referred to here as host hardware, is standard to computer systems, and may include one or more CPU(s) 32, physical memory 34, disk drives 36, memory management unit (MMU) 38, as well as conventional registers (not shown), interrupt-handling circuitry (not shown), a clock (not shown), etc. Running on the physical hardware is hypervisor software 40, including software drivers 44 which facilitate communication with various physical input/output devices 46.

As further shown in FIG. 2, virtual machines (VMs) 50 run on the host hardware of host systems 11-19. In operation, any number of VMs 50 may be present. As is well known, each VM 50 is provided with an interface representing a complete physical computer system, which interface is implemented using host hardware and virtualization software. In particular, each VM 50 is presented with guest system hardware 51 that may have one or a plurality of virtual CPUs 52 (VCPU 52), virtual system memory 53 (VMem 53), virtual disks 54 (VDisk 54), and other virtual devices 55 (VDevice 55). In addition, each VM 50 includes guest system software 56 that may include guest operating system 57 (guest OS 57) which may, but need not, be a copy of a conventional, commodity OS, as well as drivers 58 that, for example, control VDevice(s) 55. Each VM 50 may have one or more applications 60 installed to run on guest OS 57; any number of applications, including none at all, may be loaded for running on guest OS 57, the number being limited only by the requirements of each VM 50.

Virtual machine monitor (VMM) 62 is an interface between each VM 50 and the host hardware that is responsible for allowing execution of, or for executing, VM-related instructions, and for mapping guest memory to host memory 34. VMM 62 is a layer of software that runs directly on the host hardware in privileged mode. VMM 62 may include device emulators 64, which may form an implementation of guest system hardware 51. VMM 62 handles faults and interrupts engendered by or delivered for each VM 50. For simplicity of illustration, VM 50 and VMM 62 are shown as separate software; however, the combination of VM 50 and VMM 62 may be viewed as comprising a running virtual machine instance. VMM 62 may forward to hypervisor system software 40 requests by a VM 50 for machine resources. Also, VMM 62 may request hypervisor system software 40 to perform I/O by calling software drivers 44.

Referring again to FIGS. 1 and 2, one function of server cluster virtualization management software 21 is to facilitate transfer of VMs 50 among host systems 11-19 in an automated fashion. As shown in FIG. 1, server cluster virtualization management software 21 includes Distributed Resource Scheduler (DRS) module 72 and Distributed Power Management (DPM) module 74. Transfer of VMs 50 among host systems 11-19 is also referred to as VM migration. Migrating all VMs 50 off of one of host systems 11-19 is referred to as evacuation of that host system. DRS module 72 manages computational resources of server cluster 20 and their allocation to each VM 50 executing on host systems 11-19. Specifically, each host system 11-19 has computational resources associated therewith that are measured, for example and without limitation, in terms of CPU cycles and memory bytes capacity available. In addition, the VMs 50 on each host system 11-19 have defined resource requirements, and place variable resource demands on the computational resources associated with host systems 11-19. DRS module 72 may: (a) power on additional host systems 11-19, if available and if needed to support the resource constraints of VMs 50; and (b) perform a load balancing function involving migrating VMs 50 among powered-on host systems 11-19 of server cluster 20. To address the resource constraints of the VMs 50, DRS module 72 ascertains whether the powered-on capacity of server cluster 20 is sufficient to satisfy the resource constraints of all VMs 50. If the resource constraints cannot be satisfied by the powered-on host systems, DRS module 72 may identify, in server cluster 20, host systems 11-19 that are in a powered-down state and which could address violations of resource constraints. For such host systems, DRS module 72 signals through a data communication channel (not shown) to those host systems to power on, and DRS module 72 requests subsequent transfer of VMs 50 to the newly powered-on ones of host systems 11-19, thereby revising a current on/off configuration (COC) of server cluster 20 to a new on/off configuration (NOC). After addressing any resource constraint violations, DRS module 72 further considers moving VMs 50 among host systems with an objective of better load balance among host systems to improve delivery of resources. An example of a suitable DRS module 72 is available as VMware Distributed Resource Scheduler from VMware, Inc. of Palo Alto, Calif., which manages the allocation of resources to a set of VMs running on a cluster of host systems, given resource-based Service Level Agreements and system- and user-specified constraints. Server cluster virtualization management software 21 may also include “high availability” software (HA) that handles host system and VM failures in a server cluster given a specification of desired policies and of associated resources to be set aside for use by VMs in the event of a failure. As such, HA implements mechanisms for detecting problems and restarting VMs. An example of suitable HA software is available as VMware High Availability from VMware, Inc. of Palo Alto, Calif.

To reduce power consumption of server cluster 20, DRS module 72 includes Distributed Power Management (DPM) module 74 that is invoked after DRS module 72 addresses the constraints and objectives described above. DPM module 74 functions to regulate the on/off configuration of server cluster 20 so that a desired level of computational performance with reduced power consumption may be established and/or maintained. This is achieved by DPM module 74 computing the utilization of each host system 11-19 in server cluster 20 to derive information about any of host systems 11-19 that are highly-utilized and any that are lightly-utilized by VMs 50 executing thereon. In general, DPM module 74 saves power in a cluster of server hosts by consolidating virtual machines onto fewer hosts and powering hosts off during periods of low resource utilization, and powering hosts back on for virtual machine use when workload demands increase. In particular, DPM module 74 saves power in a cluster by recommending evacuation and power-off of hosts when both CPU and memory resources are lightly utilized. DPM module 74 recommends powering hosts back on when either CPU or memory resource utilization increases appropriately or host resources are needed to meet other user-specified constraints. DPM module 74 leverages the capability of executing DRS module 72 in a “what-if” mode to ensure its host power recommendations are consistent with cluster constraints and objectives being managed by DRS module 72. The reason that DPM module 74 chooses to evacuate host systems and power them down is that host systems typically burn 60% or more of their peak power when totally idle, so the power savings possible with this approach are substantial. Once DPM module 74 has determined how many host systems need to remain powered on to handle the load and to satisfy all relevant constraints, and DRS module 72 has distributed VMs across the host systems in keeping with resource allocation constraints and objectives, each individual host system is free to power-manage its hardware to run the presented load efficiently, without any need for DPM module 74 involvement. Thus, DPM module 74 can save power in server cluster 20 when there are periods of low utilization of cluster resources, and DPM module 74 operates in concert with DRS module 72 constraints and HA constraints, if any, saving power while ensuring the availability of powered-on resources to satisfy, for example, Service Level Agreements.

In accordance with one or more embodiments of the present invention, DPM module 74 can be enabled or disabled at the cluster level. When enabled for a server cluster, DPM module 74 can operate in manual mode, in which execution of DPM module 74 recommendations requires confirmation by a user, or in automatic mode, in which DPM module 74 recommendations are executed without user confirmation. In addition, DPM can be set as disabled, manual, or automatic on a per-host basis; per-host settings apply only when DPM module 74 is enabled for the cluster. Various default settings of DPM are intended to support performance and power-efficient use of cluster resources, and may be changed by the user.

DPM Module 74 Operation:

As set forth above, the goal of DPM module 74 is to keep utilization of host systems in a server cluster within a target range, subject to constraints specified by DPM operating parameters and those associated with DRS and, optionally, HA. To do this, DPM module 74 considers recommending host system power-on operations when there are host systems whose utilization is above this range and host system power-off operations when there are host systems whose utilization is below it. In accordance with one or more embodiments of the present invention, DPM module 74 is run as part of a periodic (for example and without limitation, every 5 minutes by default) invocation of DRS module 72, immediately after DRS module 72 cluster analysis and rebalancing completes. DRS module 72 itself may recommend host power-on operations, if needed, as a prerequisite for migration recommendations to address HA or DRS constraint violations, to handle user requests involving host evacuation, or to place VMs on hosts for power-on.

DPM Module 74 Method for Evaluating Host Utilization:

DPM module 74 evaluates the CPU and memory resource utilization of each host system and aims to keep each host system's resource utilization within a range DemandCapacityRatioTarget +/− DemandCapacityRatioToleranceHost, where configurable parameter DemandCapacityRatioTarget is a DPM module 74 per-host utilization target (for example and without limitation, a default is 63%), and configurable parameter DemandCapacityRatioToleranceHost is a DPM module 74 per-host tolerance around its target utilization (for example and without limitation, a default is 18%), meaning a default utilization range of 63% +/− 18%, or 45% to 81%. Each host system's resource utilization is calculated as demand/capacity, where demand is the total amount of CPU or memory resource needed by VMs currently running on the host system, and capacity is the total amount of CPU or memory resource currently available on the host system for use by running VMs. A VM's demand includes both its actual usage and an estimate of its unsatisfied demand. This compensates for cases in which a demand value is constrained by host system resources currently available to the VM. Note that if a host system resource is heavily contended, its utilization can exceed 100%.
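
By way of a non-limiting illustration only, the per-host utilization evaluation described above can be sketched in Python as follows; the constant names mirror the configurable parameters just described, and the helper functions themselves are hypothetical rather than part of any embodiment:

    # Defaults mirror DemandCapacityRatioTarget and DemandCapacityRatioToleranceHost.
    DEMAND_CAPACITY_RATIO_TARGET = 0.63
    DEMAND_CAPACITY_RATIO_TOLERANCE_HOST = 0.18

    def host_utilization(demand, capacity):
        # Utilization is demand/capacity; it can exceed 1.0 under heavy contention.
        return demand / capacity

    def target_utilization_range(target=DEMAND_CAPACITY_RATIO_TARGET,
                                 tolerance=DEMAND_CAPACITY_RATIO_TOLERANCE_HOST):
        # Returns the (low, high) band, e.g. (0.45, 0.81) with the defaults.
        return target - tolerance, target + tolerance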

DPM module 74 calculates each host system's demand as a sum, across the host system's running VMs, of each VM's average demand over an historical period of interest plus a configurable number of standard deviations (with the sum capped at the VM's maximum demand observed over the period). The configurable number VmDemandHistoryNumStdDevAboveAve of standard deviations above the average demand over the period in question that DPM module 74 uses in considering demand in its utilization computation could have, for example and without limitation, a default of 2. Using a VM's average demand over a period of interest, rather than simply its current demand, is intended to ensure that the demand used is not anomalous. The period of interest DPM module 74 considers with respect to: (a) evaluating demand that may lead to host power-on is the last VmDemandHistorySecsHostOn seconds, where configurable parameter VmDemandHistorySecsHostOn is the period of demand history DPM module 74 uses with respect to considering host power-on to address high utilization (for example and without limitation, a default is 300 seconds or 5 minutes); and (b) evaluating demand that may lead to host power-off is the last VmDemandHistorySecsHostOff seconds, where configurable parameter VmDemandHistorySecsHostOff is the period of demand history DPM module 74 uses with respect to considering host power-off to address low utilization (for example and without limitation, a default is 2400 seconds or 40 minutes). In accordance with one or more embodiments of the present invention, the shorter default history period considered for host power-on is chosen so that DPM module 74 responds relatively rapidly to increases in composite VM demand, while the longer default history period considered for host power-off is chosen so that DPM module 74 responds relatively slowly to decreases in composite VM demand. Computing VM demand using a configurable number of standard deviations above its average demand is intended to provide significant coverage of the probable range of the demand, based on observed past demand during the period of interest.
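
A minimal sketch of this demand estimate, assuming per-VM demand samples over the relevant history window are available, might look as follows (num_std_dev corresponds to VmDemandHistoryNumStdDevAboveAve; the helpers are illustrative only):

    from statistics import mean, pstdev

    def vm_demand_estimate(samples, num_std_dev=2):
        # Average demand over the window plus num_std_dev standard deviations,
        # capped at the maximum demand observed over the same window.
        return min(mean(samples) + num_std_dev * pstdev(samples), max(samples))

    def host_demand(per_vm_samples, num_std_dev=2):
        # Host demand is the sum of the per-VM estimates for its running VMs.
        return sum(vm_demand_estimate(s, num_std_dev) for s in per_vm_samples)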

If any host system's CPU or memory resource utilization over the period considered with respect to host power-on is above the target utilization range, DPM module 74 considers powering host systems on. If any host system's CPU resource utilization and any host system's memory resource utilization over the period considered with respect to host power-off are below the target utilization range, DPM module 74 considers powering host systems off, when host systems are not already being recommended for power-on.
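
Expressed as an illustrative sketch (note the asymmetry: power-on is considered if either resource is above the range on some host, while power-off requires both resources to be below the range on some host, consistent with the pseudo-code later in this description):

    def consider_host_power_on(cpu_utils, mem_utils, high):
        # True if any host is above the range for CPU OR for memory.
        return any(u > high for u in cpu_utils) or any(u > high for u in mem_utils)

    def consider_host_power_off(cpu_utils, mem_utils, low):
        # True only if some host is below the range for CPU AND some host is
        # below the range for memory.
        return any(u < low for u in cpu_utils) and any(u < low for u in mem_utils)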

DPM Module 74 Method for Ensuring Host Capacity is Powered-on when Needed to Address VM Demand:

If the host resource utilization evaluation described above leads DPM module 74 to consider recommending host power-on operations to address high utilization, DPM module 74 iterates through standby host systems, i.e., host systems powered off, in a sorted order (described below). For each standby host system, DPM module 74 invokes DRS module 72 in a “what-if” mode to rebalance the VMs across host systems in the server cluster, assuming that host system were powered on. To quantify the impact of powering on a standby host system with respect to reducing the number of highly-utilized host systems in the server cluster and/or to diminishing their distance above the target utilization, DPM module 74 computes for each resource a score denoted highScore as a sum of the weighted distance above the target utilization for each host system above that target. DPM module 74 compares the value of highScore for the server cluster without the host system powered on with that calculated for the server cluster via the DRS module 72 “what-if” mode run with the host system powered on. If the associated value of highScore is stably improved for the server cluster with the standby host system powered on, DPM module 74 generates a power-on recommendation for the host system. Note that in accordance with one or more embodiments of the present invention, in comparing highScore values, if the memory resource is overcommitted on host systems in the server cluster, DPM module 74 will give reduction in memory utilization higher importance than it gives impact on CPU resources. DPM module 74 continues to iterate through the standby host systems for power-on consideration, as long as there are any host systems in the server cluster exceeding the target utilization range for either CPU or memory resources. In accordance with one or more such embodiments, DPM module 74 will skip, with respect to power-on consideration, any standby host systems that are equivalent (in terms of VMotion compatibility and of having the same or fewer CPU and memory resources) to any host systems already rejected for power-on based on the DRS module 72 “what-if” evaluation during this round of iterative consideration.
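
The highScore comparison can be sketched as follows, following the formula given in the pseudo-code later in this description; the what_if_rebalance callable and the plan object stand in for the DRS module 72 “what-if” run and are purely hypothetical:

    import math

    def high_score(utilizations, target):
        # Weighted distance above the target, summed over hosts above the target.
        return math.sqrt(sum((u - target) ** 2 for u in utilizations if u > target))

    def recommend_power_on(standby_hosts, plan, target, what_if_rebalance):
        recommendations = []
        for host in standby_hosts:                 # iterate in sorted order (see below)
            candidate = what_if_rebalance(plan, power_on=host)
            if (high_score(candidate.utilizations(), target)
                    < high_score(plan.utilizations(), target)):
                recommendations.append(("power-on", host))
                plan = candidate                   # continue from the accepted plan
        return recommendations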

DPM module 74 then recommends powering on any additional host systems needed to reach a minimum amount of powered-on CPU or memory resources. For example, this may be the maximum of any values specified by HA, optionally set by the user, or defined by default. In accordance with one or more embodiments of the present invention, specifying a minimum amount of powered-on capacity is not required, since DRS module 72/DPM module 74 will recommend that appropriate host systems be powered on when needed and will keep host systems powered on to respect any HA failover settings. Further, in accordance with one or more further embodiments of the present invention, one can specify that a particular minimum amount of CPU and/or memory capacity be kept powered on, even when that capacity is not deemed necessary by DRS module 72/DPM module 74. Note that the host capacity kept powered on to satisfy these settings is not necessarily compatible with the future needs of some arbitrary VM (for example, it may not match the required CPU characteristics), so these settings are most useful in server clusters of similar host systems that are compatible with the majority of VMs. Configurable parameter MinPoweredOnCpuCapacity is the minimum amount of powered-on CPU capacity in MHz to be maintained by DPM module 74 (for example and without limitation, a default is 1 MHz); and configurable parameter MinPoweredOnMemCapacity is the minimum amount of powered-on memory capacity to be maintained by DPM module 74 (for example and without limitation, a default is 1 MB). Note that at least one host system in the server cluster is kept powered on, and that host systems powered on solely to reach a specified minimum amount of CPU or memory resources are not needed to accommodate VMs currently running in the server cluster, and may be idle.
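
A sketch of the minimum powered-on capacity check, assuming hypothetical host attributes for per-host CPU and memory capacity, might be:

    MIN_POWERED_ON_CPU_MHZ = 1    # MinPoweredOnCpuCapacity default
    MIN_POWERED_ON_MEM_MB = 1     # MinPoweredOnMemCapacity default

    def power_on_for_min_capacity(standby_hosts, cpu_mhz_on, mem_mb_on,
                                  min_cpu=MIN_POWERED_ON_CPU_MHZ,
                                  min_mem=MIN_POWERED_ON_MEM_MB):
        # Recommend additional power-ons until the powered-on CPU and memory
        # capacity both reach the configured minimums.
        recommendations = []
        for host in standby_hosts:
            if cpu_mhz_on >= min_cpu and mem_mb_on >= min_mem:
                break
            recommendations.append(("power-on", host))
            cpu_mhz_on += host.cpu_mhz
            mem_mb_on += host.mem_mb
        return recommendations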

DPM Method for Determining when Host Capacity is Excess and can be Powered-Down:

If the host resource utilization evaluation described above leads DPM module 74 to consider recommending host system power-off to address low utilization, DPM module 74 iterates through the powered-on host systems in the sorted order described below. For each powered-on host system, DPM module 74 invokes DRS module 72 in a “what-if” mode to rebalance the VMs across the host systems in the server cluster, assuming that the host system were powered off. To quantify the impact of powering off a host system with respect to reducing the number of lightly-utilized host systems in the server cluster and/or to diminishing their distance below the target utilization, DPM module 74 computes for each resource a score denoted lowScore as a sum of the weighted distance below the target utilization of all host systems below that target. DPM module 74 compares the value of lowScore for the server cluster without the host system powered off with that calculated for the server cluster via the DRS module 72 “what-if” mode run with the host system powered off. If the associated value of lowScore is improved with the host system powered off and if the value of highScore described above for the resulting server cluster is not worse than that with the host system kept powered on, DPM module 74 generates a recommendation to power off the host system, along with recommendations for any needed prerequisite migrations of VMs off of that host system. DPM module 74 continues to iterate through the powered-on host systems for power-off consideration, as long as the server cluster contains any host systems below the target utilization range for CPU resources and any host systems below the target utilization range for memory resources.
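
The corresponding lowScore test can be sketched as follows; a candidate power-off is kept only when lowScore improves and highScore does not get worse, where before and after are the host utilizations under the current plan and under the “what-if” plan, respectively (an illustrative sketch, not a definitive implementation):

    import math

    def low_score(utilizations, target):
        # Weighted distance below the target, summed over hosts below the target.
        return math.sqrt(sum((target - u) ** 2 for u in utilizations if u < target))

    def high_score(utilizations, target):
        return math.sqrt(sum((u - target) ** 2 for u in utilizations if u > target))

    def accept_power_off(before, after, target):
        return (low_score(after, target) < low_score(before, target)
                and high_score(after, target) <= high_score(before, target))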

In accordance with one or more embodiments of the present invention, several additional factors are also considered with respect to placing a host system in standby. One factor is that DPM module 74 will not recommend any host system power-off operations (and hence DPM module 74 is effectively disabled) if DRS module 72 migration is set such that it will not produce any non-mandatory recommendations to move VMs to those host systems. A second factor is that DPM module 74 rejects powering down a host system if its entering standby would take the powered-on capacity of the server cluster below the specified minimum (described above). And a third factor is that DPM module 74 chooses not to power down a host system if the conservatively-projected benefit of placing that host system into standby does not exceed, by a specified multiplier, the potential risk-adjusted cost of doing so, as described in the cost/benefit analysis below.

Host System Power-Off Cost/Benefit Analysis:

Host system power-off has a number of potential associated costs, including the cost of migrating any running VMs off of the associated host system, the loss of the host system's resources during power-down, the power consumed during the power-down period, the loss of performance if the host system's resources become needed to meet demand while the host system is powered off, the loss of the host system's resources during its subsequent power-on operation, the power consumed during the power-up period, and the costs of migrating VMs back onto the host system. For each host system considered for power-off, DPM module 74 compares these costs (taking into account an estimate of their associated risks) with a conservative projection of the power-savings benefit that will be obtained by powering off the host system, in an analysis step called DPM power-off cost/benefit.

DPM module 74 power-off cost/benefit calculates StableOffTime, which is the time a host system is powered off and unlikely to be needed; the power saved during this time represents a risk-adjusted conservative benefit of powering the host system down. The time it takes to power off a host system is computed as a sum of the time to evacuate VMs currently running on that host system (HostEvacuationTime) and the subsequent time to perform an orderly shutdown of the host system (HostPowerOffTime). The time at which a host system becomes likely to be needed is denoted as ClusterStableTime and is conservatively computed as a configurable percentile value of the running VMs' demand stable times, based on the coefficient of variance of the demand of each. DPM module 74 power-off cost/benefit sorts the running VMs' demand stable times (based on the coefficient of variance of the demand of each) in ascending order. The configurable parameter PowerPerformancePercentileMultiplier is the percentile point within this list that is selected as an estimate of the time at which all VMs in the server cluster are projected to jump to a high demand level suggested by their history; it has, for example and without limitation, a default of 10. Hence, StableOffTime is computed as ClusterStableTime − (HostEvacuationTime + HostPowerOffTime). At the end of ClusterStableTime, the demand for each VM is conservatively assumed to rise to a high level, which is computed as the mean of its demand over the last PowerPerformanceHistorySecs seconds (a configurable parameter representing the period of demand history considered by DPM module 74 power-off cost/benefit, for example and without limitation, a default is 3600 seconds) plus PowerPerformanceVmDemandHistoryNumStdDev standard deviations (a configurable parameter representing the number of standard deviations above the average demand over the period in question that DPM module 74 power-off cost/benefit uses in computing its conservative high demand point, for example and without limitation, a default is 3). DPM module 74 rejects a host system for power-off if StableOffTime is computed as less than or equal to 0.
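
A minimal sketch of these two calculations, assuming the evacuation time, shutdown time, and cluster-stable time are expressed in the same time units and that per-VM demand samples over the history window are available:

    from statistics import mean, pstdev

    def stable_off_time(cluster_stable_time, evacuation_time, power_off_time):
        # StableOffTime = ClusterStableTime - (HostEvacuationTime + HostPowerOffTime);
        # a host is rejected for power-off if this is <= 0.
        return cluster_stable_time - (evacuation_time + power_off_time)

    def conservative_high_demand(samples, num_std_dev=3):
        # Demand each VM is assumed to rise to at the end of ClusterStableTime:
        # mean over the history window plus num_std_dev standard deviations.
        return mean(samples) + num_std_dev * pstdev(samples)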

With respect to host systems for which this StableOffTime benefit period is greater than 0, DPM module 74 compares the host system's power-off benefit to its cost, both expressed in terms of resources as the common unit. Power-off benefit is computed as the resource capacity saved (i.e., powered off) during StableOffTime. Power-off cost is calculated as the resource costs of migrating VMs off of this host system prior to power-off, the expected resource costs of migrating VMs back onto this host system when the conservatively-projected high demand occurs, and any associated performance impact in terms of unsatisfied resource demand for the period during which a needed host system is being brought out of standby. DPM module 74 cost/benefit rejects a potential host system power-off recommendation unless the benefit is greater than or equal to the cost multiplied by PowerPerformanceRatio for all resources (PowerPerformanceRatio is a configurable parameter that represents the multiplier by which benefit must meet or exceed performance impact, for example and without limitation, a default of 40).
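
The acceptance test can be illustrated as follows, with benefit and cost given as per-resource values in a common resource unit and PowerPerformanceRatio defaulting to 40 as noted above:

    POWER_PERFORMANCE_RATIO = 40

    def power_off_worthwhile(benefit_by_resource, cost_by_resource,
                             ratio=POWER_PERFORMANCE_RATIO):
        # Accept the power-off only if, for every resource (e.g. CPU and memory),
        # the projected benefit is at least ratio times the risk-adjusted cost.
        return all(benefit_by_resource[r] >= ratio * cost_by_resource[r]
                   for r in benefit_by_resource)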

Sort Order in which DPM Module 74 Considers Host Systems for Potential Power-on or Power-Off:

With respect to both power-on and power-off operations, host systems in DPM module 74 automatic mode are considered before host systems in DPM module 74 manual mode. Host systems at the same DPM module 74 automation level are considered in order of capacity with respect to the more critical resource (CPU or memory) and then with respect to the other resource; hence, larger capacity host systems are favored for power-on and smaller for power-off. Host systems at the same automation level and capacity are considered for power-off in order of lower VM evacuation cost. For ties with respect to the previous factors, host systems are considered for power-off in randomized order, to spread the selection across host systems for a wear-leveling effect. Other factors may be considered in determining host system ordering for power-on or power-off consideration such as, for example and without limitation, host system power efficiency.
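
One possible sort key reflecting this ordering for power-off candidates is sketched below; the host attributes are hypothetical placeholders rather than fields of any particular implementation:

    import random

    def power_off_sort_key(host):
        # Automatic-mode hosts first, then smaller capacity on the more critical
        # resource, then the other resource, then lower VM evacuation cost, with
        # a random tie-breaker to spread selection for wear-leveling.
        return (0 if host.dpm_mode == "automatic" else 1,
                host.capacity_critical_resource,
                host.capacity_other_resource,
                host.evacuation_cost,
                random.random())

    # Usage: candidates = sorted(powered_on_hosts, key=power_off_sort_key)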

Note that the order in which host systems are considered by DPM module 74 does not determine the actual order in which host systems are selected for power-on or power-off. As explained previously, DPM module 74 invokes DRS module 72 in a “what-if” mode for each candidate host system, and there are a number of reasons why a candidate host system may be rejected, based on DRS module 72 operating constraints and objectives. For host power-off, some example situations limiting host selection include constraints leading to an inability to evacuate all VMs from a candidate host or cases in which VMs to be evacuated are only moveable to host systems that will then become (more) heavily utilized. For host power-on, some example situations limiting host selection include constraints such that no VMs would move to a host if it were powered on or such that the VMs that would move to a candidate host are not expected to reduce load on the highly-utilized hosts in the cluster. In addition, DPM module 74 will not strictly adhere to its host sort order if doing so would lead to choosing a host with excessively larger capacity than needed, if a smaller capacity host that can adequately handle the demand is also available.

DPM module 74 host system power recommendations are assigned ratings, signifying their expected importance given the current utilization of host systems in the server cluster, and any constraints on powered-on capacity. Host system power-on recommendations are rated, for example, as 3 to 5. Power-on recommendations generated to meet any HA or optional user-specified powered-on capacity requirements receive a rating of 5. Power-on recommendations produced to address high host utilization are rated as 3 or 4, with the higher number meaning that host system utilization is closer to saturation. Host power-off recommendations are rated as 1 to 4. A higher rating for power-off signifies a larger amount of unused but powered-on capacity in the cluster, and hence a more attractive opportunity for power savings given the powered-on resource headroom. These ratings could also be expressed as priorities, e.g., with a priority of 1 being equivalent to a rating of 5.

DPM module 74 recommendation ratings are compared to a configured DPM module 74 recommendation threshold (for example, from 1 to 5), and DPM module 74 discards recommendations below the threshold. For example, a DPM recommendation threshold of 1 means all DPM module 74 recommendations meet the threshold.
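
As an illustration, and assuming each recommendation carries a rating field, the threshold filtering amounts to:

    def filter_recommendations(recommendations, threshold=1):
        # Keep only recommendations whose rating meets the configured DPM
        # recommendation threshold; a threshold of 1 keeps everything.
        return [r for r in recommendations if r["rating"] >= threshold]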

In light of the above, one can readily appreciate that, in accordance with one or more embodiments of the present invention, DPM module 74 periodically compares demand for computational resources with the available computational capacity of powered-on host systems. If the demand to capacity ratio is too high (for example, compared to a predetermined or user-set parameter) on any host system in server cluster 20, DPM module 74 asks DRS module 72 to produce a “what-if” plan for server cluster 20, assuming a particular powered-down host system were available and powered on. If that plan reduces high host utilization, DPM module 74 accepts the “what-if” plan, and continues. DPM module 74 iterates in this fashion, as long as it detects high utilization on any host system in a configuration of server cluster 20 that includes host systems to be powered on from previous steps. After that, if DPM module 74 determines that the demand to capacity ratio is too low (for example, compared to a predetermined or user-set parameter) on any host system in server cluster 20, DPM module 74 asks DRS module 72 to produce a “what-if” plan for evacuating VMs from a specified powered-on host system and utilizing the remaining host systems more fully, in accordance with all relevant resource allocation, performance, and high availability attributes. If such a plan can be produced that ameliorates low host utilization without resulting in high host utilization and meets cost/benefit criteria concerning performance impact risk versus power savings, DPM module 74 accepts the “what-if” plan, and continues. DPM module 74 iterates in this fashion through available powered-on host systems, as long as it detects low utilization on any host system in server cluster 20. Note that, in accordance with one or more embodiments of the present invention, considering the demand to capacity ratio on a per-host-system basis allows handling a case in which host systems in server cluster 20 are not homogeneous in size or configuration, meaning that some host systems may be highly utilized even when server cluster 20 is not so in an overall sense. Also note that a calculation of demand for purposes of determining utilization can be based on current, historical, and predicted data, in accordance with parameters that may be modified by the user.

In accordance with one or more embodiments of the present invention, DRS module 72 is run at a periodic time interval, for example, every five minutes, which is consistent with an ability to meet cluster management objectives relating, for example and without limitation, to allocation of cluster computing resources. The time interval may be set by a user as desired, and/or DRS module 72 may be invoked aperiodically, in reaction to user input or in reaction to a cluster-related change such as a host system failure.

The following is pseudo-code for a method of considering host system power-on and power-off operations in server cluster 20 in accordance with one or more embodiments of the present invention. In the pseudo-code, targetUtilization is the desired demand/capacity ratio. For example, this is a user-defined parameter. In particular, in accordance with one or more embodiments of the present invention, targetUtilization is, for example and without limitation, 63%. In the pseudo-code, tolerance is a range of values around targetUtilization. For example, this is a user-defined parameter. In particular, in accordance with one or more embodiments of the present invention, tolerance is, for example and without limitation, ±18%. The steps of the pseudo-code are as follows:

100. For the current on/off configuration (COC), DRS module 72 runs to address constraints and perform load-balancing, with powering-on of host systems allowed.
101. DRS module 72 calls DPM module 74, whose operation is parameterized by targetUtilization and tolerance.
102. For the COC, for each powered-on host system, for each computational resource, DPM module 74 calculates hostUtilization, where hostUtilization = demand/capacity, demand = each running VM's desired resources (actual usage + estimate of unsatisfied demand), and capacity = host system resources for use by VMs.
103. For the COC, DPM module 74 calculates the following across each powered-on host system and for each computational resource:
    for host systems with hostUtilization > targetUtilization for the resource:
        highScore[computational resource] = SQRT(SUM(SQR(hostUtilization - targetUtilization)));
        highUtil[computational resource] = (any host system hostUtilization > (targetUtilization + tolerance)) ? true : false;
        considerHostPowerOn = (highUtil[computational resource] for either resource) ? true : false;
    for host systems with hostUtilization < targetUtilization for the resource:
        lowScore[computational resource] = SQRT(SUM(SQR(targetUtilization - hostUtilization)));
        lowUtil[computational resource] = (any host system hostUtilization < (targetUtilization - tolerance)) ? true : false;
        considerHostPowerOff = (lowUtil[computational resource] for both resources) ? true : false.
104. If considerHostPowerOn, then consider recommending host system power-on operations; else if considerHostPowerOff, then consider recommending host system power-off operations.

The pseudo-code for powering-on host systems of server cluster 20 is as follows:

105. Perform steps 100-104, recited above.
106. For the next standby host system H, create NOC with host system H powered-on.
107. For NOC, run step 100, with powering-on of host systems disallowed.
108. For NOC, run steps 102 and 103.
109. If NOC highScore[computational resource] is better than COC highScore[computational resource], then
110. Recommend powering-on host system H, and replace COC with NOC.
111. Repeat steps 106-110 while considerHostPowerOn is true for COC and there are more host systems to consider.
112. Power-on any additional standby host systems needed to reach min-powered-on-capacity.

The pseudo-code for powering-off host systems of server cluster 20 is as follows:

113. Perform steps 100-104.
114. For the next powered-on host system H, create NOC with host system H evacuated (if possible) and powered-off.
115. Call DPM module 74 host system power-off cost/benefit to evaluate if the power savings is worth the performance risk.
116. For NOC, run step 100, with powering-on of host systems disallowed.
117. For NOC, run steps 102 and 103.
118. If NOC lowScore[computational resource] is better than COC lowScore[computational resource]; and
119. If NOC highScore[computational resource] is not worse than COC highScore[computational resource], then
120. Recommend powering-off host system H along with any prerequisite VMotions, and replace COC with NOC.
121. Repeat steps 114-120 while considerHostPowerOff is true for COC and there are more host systems to consider.

To determine the cost/benefit of powering-off a particular host system of server cluster 20, DPM module 74 compares the risk-adjusted costs of power-off with a conservative projection of the power-savings benefit, and rejects the host system power-off unless the benefit exceeds the cost by a configurable factor. The pseudo-code for determining the cost/benefit of powering-off a particular host system is as follows:

122. DPM module 74 host system power-off cost/benefit computes the risk-adjusted costs of power-off of host system H as the sum of:
123. Cost of migrating any running VMs off of the associated host system;
124. Loss of the host system's resources during the powering-off period;
125. Power consumed during the powering-off period;
126. Performance loss if resources become needed to meet demand while the host system is off;
127. Loss of the host system's resources during its subsequent powering-on period;
128. Power consumed during the powering-on period; and
129. Cost of migrating VMs back onto the host system after it is powered-on.
130. This sum is compared with a conservative projection of the power-savings benefit obtained by host system power-off.
131. Host system power-off is rejected unless the benefit exceeds the cost by a configurable factor.

With the above embodiments in mind, it should be understood that the invention can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. In one embodiment, the apparatus can be specially constructed for the required purpose (e.g., a special purpose machine), or the apparatus can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The embodiments of the present invention can also be defined as a machine that transforms data from one state to another state. The transformed data can be saved to storage and then manipulated by a processor. The processor thus transforms the data from one thing to another. Still further, the methods can be processed by one or more machines or processors that can be connected over a network. The machines can also be virtualized to provide physical access to storage and processing power to one or more users, servers, or clients. Thus, the virtualized system should be considered a machine that can operate as one or more general purpose machines or be configured as a special purpose machine. Each machine, or virtual representation of a machine, can transform data from one state or thing to another, and can also process data, save data to storage, display the result, or communicate the result to another machine.

The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can include computer readable tangible medium distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although the method operations were described in a specific order, it should be understood that other housekeeping operations may be performed in between operations, or operations may be adjusted so that they occur at slightly different times, or may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of the overlay operations is performed in the desired way.

Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

The embodiments of the present invention described above are exemplary. Many changes and modifications may be made to the disclosure recited above, while remaining within the scope of the invention. Therefore, the scope of the invention should not be limited by the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents. Additionally, embodiments of the present invention may be implemented in software, firmware, or as an abstraction of a physical computer system known in the art as a virtual machine, or a combination of software, firmware, and a virtual machine. With respect to implementing embodiments of the present invention as a virtual machine, expression of such embodiments may be either as virtual system hardware, guest system software of the virtual machine, or a combination thereof. The scope of the invention should, therefore, be limited not to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.

1. (canceled)
2. A method comprising: recommending host system power-on of a standby host system when there is a host system in a server cluster of host systems in which utilization is above a target utilization, wherein: recommending the host system power-on includes calculating an impact of powering on the standby host system with respect to reducing the number of highly-utilized host systems in the server cluster, the impact of powering on being calculated by simulating moving at least one virtual machine from at least one highly utilized host system to the standby host system being recommended to be powered-on; and calculating the impact of powering on the standby host system comprises calculating an amount by which the utilization of a plurality of host systems in the server cluster exceeds the target utilization.
3. The method of claim 2, further comprising: determining the utilization of each host system in the server cluster as a ratio of demand to capacity for that host system.
4. The method of claim 2, wherein recommending the host system power-on comprises iterating through standby host systems, and for each respective standby host system, invoking a software module supporting virtual machine resource constraints and quantifying the impact of powering on the respective standby host system.
5. The method of claim 2, wherein calculating the impact of powering on is repeated for each standby host system in the server cluster to determine whether that standby host system should be recommended to be powered-on.
6. The method of claim 2, further comprising: calculating an impact of powering off a host system within the server cluster by calculating an amount by which the utilization of a plurality of host systems in the server cluster is below the target utilization.
7. The method of claim 6, wherein calculating the impact of powering off is repeated for each powered-on host system in the server cluster to determine whether that powered-on host system should be recommended to be powered-off.
8. The method of claim 2, wherein recommending host system power-off includes calculating host power-off cost, wherein the host power-off cost is based upon assessing at least one of: a loss of the host system's resources during power-down, power consumed during a power-down period, a loss of the host system's resources during a subsequent power-on operation, power consumed during a power-up period, or costs of migrating virtual machines back onto the host system.
9. A non-transitory computer-readable medium embodying computer instructions executable by a computing device, the computer instructions being configured to cause the computing device to at least: recommend host system power-on of a standby host system when there is a host system in a server cluster of host systems in which utilization is above a target utilization, wherein: recommending the host system power-on includes calculating an impact of powering on the standby host system with respect to reducing the number of highly-utilized host systems in the server cluster, the impact of powering on being calculated by simulating moving at least one virtual machine from at least one highly utilized host system to the standby host system being recommended to be powered-on; and calculating the impact of powering on the standby host system comprises calculating an amount by which the utilization of a plurality of host systems in the server cluster exceeds the target utilization.
10. The non-transitory computer-readable medium of claim 9, wherein the computer instructions are further configured to cause the computing device to at least: determine the utilization of each host system in the server cluster as a ratio of demand to capacity for that host system.
11. The non-transitory computer-readable medium of claim 9, wherein recommending the host system power-on comprises iterating through standby host systems, and for each respective standby host system, invoking a software module supporting virtual machine resource constraints and quantifying the impact of powering on the respective standby host system.
12. The non-transitory computer-readable storage medium of claim 9, wherein calculating the impact is repeated for each standby host system in the server cluster to determine whether that standby host system should be recommended to be powered-on.
13. The non-transitory computer-readable storage medium of claim 9, wherein the computer instructions are further configured to cause the computing device to at least: calculate the impact of powering off a host system within the server cluster by calculating an amount by which the utilization of a plurality of host systems in the server cluster is below the target utilization.
14. The non-transitory computer-readable storage medium of claim 13, wherein calculating the impact of powering off is repeated for each powered-on host system in the server cluster to determine whether that powered-on host system should be recommended to be powered-off.
15. The non-transitory computer-readable storage medium of claim 9, wherein recommending host system power-off includes calculating host power-off cost, wherein factors involved in calculating the host system power-off cost include one or more of a loss of the host system's resources during power-down, power consumed during a power-down period, a loss of the host system's resources during a subsequent power-on operation, power consumed during a power-up period, and costs of migrating virtual machines back onto the host system.
16. A system comprising: a computing device; and an application executable by the computing device, wherein the application, when executed by the computing device, is configured to cause the computing device to at least: recommend host system power-on of a standby host system when there is a host system in a server cluster of host systems in which utilization is above a target utilization, wherein: recommending the host system power-on includes calculating an impact of powering on the standby host system with respect to reducing the number of highly-utilized host systems in the server cluster, the impact of powering on being calculated by simulating moving at least one virtual machine from at least one highly utilized host system to the standby host system being recommended to be powered-on; and calculating the impact of powering on the standby host system comprises calculating an amount by which the utilization of a plurality of host systems in the server cluster exceeds the target utilization.
17. The system of claim 16, wherein the application causes the computing device to at least: determine the utilization of each host system in the server cluster as a ratio of demand to capacity for that host system.
18. The system of claim 16, wherein recommending the host system power-on comprises iterating through standby host systems, and for each respective standby host system, invoking a software module supporting virtual machine resource constraints and quantifying an impact of powering on the respective standby host system.
19. The system of claim 16, wherein calculating the impact is repeated for each standby host system in the server cluster to determine whether the standby host system should be recommended to be powered-on.
20. The system of claim 16, wherein the application causes the computing device to at least: calculate an impact of powering off a host system within the server cluster by calculating an amount by which the utilization of a plurality of host systems in the server cluster is below the target utilization.
21. The system of claim 16, wherein calculating the impact of powering off is repeated for each powered-on host system in the server cluster to determine whether the powered-on host system should be recommended to be powered-off.